Method for the synchronization of two digital data flows with identical content

ABSTRACT

The invention relates to a method of synchronizing two digital data streams with the same content, the method comprising the steps of: 
-   a) generating at given intervals for each of the two digital data streams S₁ and S₂ at least two characteristic numbers expressing at least one parameter characteristic of their content;
-   b) generating from said numbers points D₁ and D₂ for each of the two streams S₁ and S₂ representing at least one of said characteristic parameters in a space of at least two dimensions, the points D₁ corresponding to the stream S₁ and the points D₂ corresponding to the stream S₂ that are situated in a time period T defining trajectories representative of the data streams S₁ and S₂ to be synchronized;
-   c) shifting the time periods of duration T assigned to the digital data streams S₁ and S₂ relative to each other by calculating a criterion of superposition of said trajectories having an optimum value representing the required synchronization;
-   d) choosing the shift between the time periods corresponding to said optimum value as a value representative of the synchronization.

The invention relates to a method of synchronizing two digital data streams with the same content, for example a reference stream transmitted by a broadcasting system and the received stream, which may be degraded, the method being usable in particular to evaluate transmission quality.

BACKGROUND OF THE INVENTION

The introduction of digital technology into the field of broadcasting audiovisual signals has opened up new prospects and means that users may be offered more services.

The signals are modified during the various stages of broadcasting them because technical constraints imposed, for example in terms of bit rate or bandwidth, cause characteristic deterioration during difficult transmission conditions.

To be able to provide a quality assured service, it is necessary to develop tools and instruments for measuring the quality of the signals and, where applicable, for estimating the magnitude of the deterioration that has occurred. Many measuring methods have been developed for this purpose. Most of them are based on comparing the signal present at the input of the system under test, which is called the reference signal, with the signal obtained at the output of the system, which is called the degraded signal. Certain “reduced reference” methods compare numbers calculated for the reference signal and for the degraded signal instead of using the signal samples directly. In both cases, in order to evaluate quality by means of a comparison technique, it is necessary to synchronize the signals in time.

FIG. 1 depicts the general principle of these methods.

Although synchronization of the signals may be easily achieved in simulation or when the system under test is small, for example a coder-decoder (codec), and not geographically distributed, this is not the case in a complex system, in particular in the situation of monitoring a broadcast network. Thus the synchronization step of quality measuring algorithms is often critical.

In addition to applications for measuring quality in a broadcast network, the method described herein is applicable whenever temporal synchronization between two audio and/or video signals is required, in particular in the context of a distributed and extended system.

Various techniques may be used to synchronize digital signals in time. The objective is to establish a correspondence between a portion of the degraded signal S_(D) and a portion of the reference signal S_(R). FIG. 2 depicts this in the case of two audio signals. The problem is to determine a shift DEC that will synchronize the signals.

In the case of an audio signal, the portion (or element) for which a correspondence has to be established is a time window, i.e. a period of the signal with an arbitrary duration T.

The existing methods may be divided into three classes:

-   Correlation approach in the time domain: This is the most usual approach and consists in comparing samples of the two audio signals S_(R) and S_(D) to be synchronized, based on their content. Thus the normalized intercorrelation function between S_(R) and S_(D), for example, looks for the maximum resemblance over a given time period T, for example plus or minus 60 ms, i.e. a total period of 120 ms. The accuracy of synchronization obtained is potentially to the nearest sample.
-   Correlation approach in the time domain using marker signals: methods that use this principle seek to overcome the necessity for significant variations in the signal. To this end, a specific marker signal designed to allow robust synchronization is inserted into the audio signal S_(R). Thus exactly the same intercorrelation method may be applied to the marker signals extracted from the signals S_(R) and S_(D) to be synchronized, which in theory allows robust synchronization regardless of the content of the audio signal. In order to use this method, the marker signal must be inserted in such a way that the modification of the content of the audio signal is as imperceptible as possible. Several techniques may be used to insert marker signals or other specific patterns, including “watermarking”.
-   Synchronization using temporal markers: methods of this class are usable only if the signals are associated with temporal markers. Thus the method relies on identifying, for each marker of the reference signal, the nearest marker in the series of markers associated with the degraded signal.
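By way of illustration only, the first class (correlation in the time domain) can be sketched in Python as follows. The function name `best_lag`, the lag range expressed in samples, and the treatment of the window edges (only sample pairs present in both signals are compared) are assumptions made for this sketch and are not taken from any particular prior art implementation.

```python
import math

def best_lag(ref, deg, max_lag):
    """Return the lag (in samples) that maximizes the normalized
    cross-correlation between ref[n] and deg[n + lag]."""
    best, best_score = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        # Keep only the sample pairs that exist in both signals (assumption).
        pairs = [(ref[n], deg[n + lag]) for n in range(len(ref))
                 if 0 <= n + lag < len(deg)]
        num = sum(x * y for x, y in pairs)
        den = math.sqrt(sum(x * x for x, _ in pairs) *
                        sum(y * y for _, y in pairs))
        score = num / den if den else 0.0
        if score > best_score:
            best, best_score = lag, score
    return best

# Hypothetical test signal: the "degraded" copy is the reference delayed by 7 samples.
ref = [math.sin(0.37 * n) + 0.5 * math.sin(1.1 * n) for n in range(200)]
deg = [0.0] * 7 + ref[:-7]
print(best_lag(ref, deg, 10))
```

Every candidate lag requires a pass over all the samples of the window, and both complete signals must be available at the same place.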

A powerful signal synchronization method is characterized by a compromise between:

-   its accuracy, i.e. the maximum error that occurs on synchronizing two signals (in particular, the method may be sensitive to the content of the signals),
-   its calculation complexity, and
-   finally, the volume of data necessary for effecting the synchronization.

The main drawback of the techniques most usually employed (using the correlation approach referred to above) is the calculation power that is necessary, which becomes very high as the search period T increases (see FIG. 2). Another major drawback is the necessity for the content to evolve significantly and continuously. Depending on the type of signals analyzed, this is not always achieved. The content of the signals therefore has a direct influence on the performance of the method. Moreover, to utilize this type of approach on complete temporal signals, it is necessary to have both the signals S_(R) and S_(D) available at the comparison point; this is a very severe constraint that is impossible to satisfy in some applications, such as monitoring an operational broadcasting network.

A feature of the second approach (using correlation with marker signals) is the modification of the content of the audio signal resulting from inserting the marker signals, with no guarantee as to how this will impact on quality; the measurement method therefore influences the measurement itself. Regardless of the performance achieved in terms of synchronizing the two signals, this approach is not always suitable for a real quality evaluation application.

Finally, the major drawback of synchronization using temporal markers is the necessity to provide the temporal markers. Because the accuracy of the temporal markers is not always satisfactory, only a few applications are able to use a technique of this kind.

In the context of broadcast network monitoring, and because of the multiple constraints that apply to the signals transported and the many items of equipment the signals pass through (coders, multiplexers, transmultiplexers, decoders, etc.), there is no strict relationship between the audio signals and the temporal markers. Thus this solution does not achieve the necessary accuracy for a quality measuring application using a reference.

OBJECTS AND SUMMARY OF THE INVENTION

An object of the present invention is to define a method of achieving synchronization with a chosen level of accuracy, of lower complexity than existing methods, and combining the advantages of several approaches. “Coarse” synchronization in accordance with the invention delimits an error range whose duration is compatible with the subsequent use of standard “fine” synchronization methods if extreme accuracy is required.

The novelty of the proposed method is that it achieves synchronization on the basis of at least one characteristic parameter that is calculated from the signals S_(D) and S_(R) and defines a multidimensional trajectory, from which the synchronization of the signals themselves is deduced. Because this method uses the temporal content of the signals, the content must vary continuously to ensure optimum synchronization, as in the prior art temporal correlation methods. The advantage of the proposed method is that it achieves correlation using a multidimensional trajectory obtained in particular by combining a plurality of characteristic parameters, which makes it more reliable than the prior art methods.

A fundamental advantage of the method proposed by the invention is that it necessitates only a small quantity of data to achieve synchronization, which is highly beneficial in the context of broadcast network monitoring. In fact, in this context, it is generally not possible to have the two complete signals S_(R) and S_(D) available at the same location. Consequently, it is not possible to use the standard temporal correlation approach. Moreover, in the context of a quality measurement application, the second approach using correlation with marker signals is not easily applicable because it impacts on the quality of the signals. In contrast to this, the synchronization method of the invention is compatible with quality measurement techniques based on comparing parameters calculated from the signals. The data representative of the characteristic parameter(s) is usually conveyed to the comparison points over a digital link. This digital link advantageously uses the same transmission channel as the audio signal; alternatively, a dedicated digital link may be used. In one particular embodiment, used in a quality measurement application, the data used to achieve synchronization is obtained from one or more quality measurement parameters. Moreover, coarse synchronization is obtained from data D1 and D2 calculated at intervals of Δ=1024 audio samples. Fine synchronization may be obtained from data D1 calculated at intervals of Δ=1024 audio samples and data D2 calculated at intervals of r<Δ, for example r=32 audio samples. Thus in this case the method obtains fine synchronization that is 32 times more accurate than the quality measurement parameter transmission interval.

The method therefore integrates naturally into a digital television quality monitoring system in an operational broadcast network. However, it is applicable wherever temporal synchronization between two signals is required.

Thus the proposed method achieves synchronization with an accuracy that may be chosen to obtain a very small range of uncertainty. It advantageously uses at least some of the parameters already calculated to evaluate the quality of the signal. The ability to start from an extended search period is also beneficial, especially as the robustness of synchronization increases with the duration of the starting period.

The proposed method therefore does not impose the use of temporal markers external to the audio signals. The signal to be synchronized does not need to be modified either, which is important in a quality measurement application.

Thus the invention provides a method of synchronizing two digital data streams with the same content, the method comprising the steps of:

-   a) generating at given intervals for each of the two digital data streams S₁ and S₂ at least two characteristic numbers expressing at least one parameter characteristic of their content;
-   b) generating from said numbers points D₁ and D₂ associated with each of said streams and representing at least one of said characteristic parameters in a space of at least two dimensions, the points D₁ and the points D₂ that are situated in a time period T defining trajectories representative of the data streams S₁ and S₂ to be synchronized;
-   c) shifting the time periods of duration T assigned to the digital data streams S₁ and S₂ relative to each other by calculating a criterion of superposition of said trajectories having an optimum value representing the required synchronization;
-   d) choosing the shift between the time periods corresponding to said optimum value as a value representative of the synchronization.

Advantageously in the method, one of the digital data streams is a reference stream S₁, the other data stream is a stream S₂ received via a transmission system, the numbers characteristic of the reference stream S₁ are transmitted therewith, and the numbers characteristic of the received stream S₂ are calculated in the receiver.

In a first variant of the method, the step c) entails:

-   c1) calculating a distance D between a first trajectory represented by the points D₁ belonging to a first time period of duration T and a second trajectory represented by the points D₂ belonging to a second time period of duration T, said distance D constituting said superposition criterion; and
-   c2) shifting said first and second time periods of duration T relative to each other until a minimum value is obtained for the distance D that constitutes said optimum value.

The distance D may be an arithmetic mean of the distances d, for example the Euclidean distances, between corresponding points D₁, D₂ of the two trajectories.

In a second variant of the method, the step c) entails:

-   c1) calculating a correlation function between corresponding points D₁, D₂ on the two trajectories, said correlation function constituting said superposition criterion; and
-   c2) shifting said first and second time periods of duration T relative to each other until a maximum value of the correlation function is obtained that constitutes said optimum value.

In a third variant of the method, the step c) entails:

-   c1) converting each trajectory into a series of angles between successive segments defined by the points of the trajectory; and
-   c2) shifting said first and second time periods of duration T relative to each other until a minimum value is obtained for the differences between the values of angles obtained for homologous segments of the two trajectories, said minimum value constituting said optimum value.

In the method, the step c) may entail:

-   c1) converting the two trajectories into a series of areas intercepted by successive segments defined by the points of said trajectories, the total intercepted area constituting said superposition criterion; and
-   c2) shifting the time periods of duration T relative to each other until a minimum value is obtained of said total intercepted area, which minimum value constitutes said optimum value.

To make synchronization more accurate, one of said given intervals may be equal to Δ for one of the data streams and equal to r<Δ for the other data stream.

In the method, the generation of said characteristic numbers for a reference audio data stream and for a transmitted audio data stream may comprise the following steps:

-   a) calculating for each time window the spectral power density of the audio stream and applying to it a filter representative of the attenuation of the inner and middle ear to obtain a filtered spectral density;
-   b) calculating individual excitations from the filtered spectral density using the frequency spreading function in the basilar scale;
-   c) determining the compressed loudness from said individual excitations using a function modeling the non-linear frequency sensitivity of the ear, to obtain basilar components; and
-   d) separating the basilar components into n classes, for example where n≤5, and preferably into three classes, and calculating for each class a number C representing the sum of the frequencies of that class, the characteristic numbers consisting of the numbers C. Alternatively there are n′<n characteristic numbers generated from said numbers C. The value chosen for n is much lower than the number of samples, for example 0.01 times that number.

In the method, the generation of a characteristic number for a reference audio data stream and for a transmitted audio data stream comprises the following steps:

-   a) calculating N coefficients of a prediction filter by autoregressive modeling; and
-   b) determining in each temporal window the maximum value of the residue as the difference between the signal predicted by means of the prediction filter and the audio signal, said maximum prediction residue value constituting one of said characteristic numbers.
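The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the coefficients are estimated separately for each window by the autocorrelation method with the Levinson-Durbin recursion, the residue is evaluated only where a full history of past samples exists, and the names `lpc` and `max_residual` are hypothetical. The text does not prescribe any of these choices.

```python
import math

def lpc(x, order):
    """Levinson-Durbin recursion on the autocorrelation of x.
    Returns a[0..order-1] such that x[n] is predicted by
    sum(a[i] * x[n - 1 - i])."""
    r = [sum(x[n] * x[n - k] for n in range(k, len(x)))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # silent window: keep zero coefficients
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def max_residual(window, order):
    """Largest absolute prediction error inside one window."""
    a = lpc(window, order)
    return max((abs(window[n] - sum(a[i] * window[n - 1 - i]
                                    for i in range(order)))
                for n in range(order, len(window))), default=0.0)

# Hypothetical windows: a pure tone, and the same tone with one click added.
clean = [math.sin(0.3 * n) for n in range(256)]
spiky = clean[:]
spiky[100] += 5.0
print(max_residual(clean, 2), max_residual(spiky, 2))
```

A stationary tone is well predicted and gives a small number, whereas an abrupt event gives a large one, so the number varies with the content of the window.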

In the method, the generation of said characteristic numbers for a reference audio data stream and for a transmitted audio data stream comprises the following steps:

-   a) calculating for each time window the spectral power density of the audio stream and applying to it a filter representative of the attenuation of the inner and middle ear to obtain a frequency spreading function in the basilar scale;
-   b) calculating individual excitations from the frequency spreading function in the basilar scale;
-   c) obtaining the compressed loudness from said individual excitations using a function modeling the non-linear frequency sensitivity of the ear, to obtain basilar components;
-   d) calculating from said basilar components N′ prediction coefficients of a prediction filter by autoregressive modeling; and
-   e) generating at least one characteristic number for each time window from at least one of the N′ prediction coefficients.

The characteristic numbers may consist of 1 to 10 of said prediction coefficients and preferably 2 to 5 of said coefficients.

One characteristic number for an audio signal may be the instantaneous power and/or the spectral power density and/or the bandwidth.

One characteristic number for a video signal may be the continuous coefficient of the transformation by a linear and orthogonal transform of at least one portion of an image belonging to the data stream, said transformation being effected by blocks or globally, and/or the contrast of at least one area of the image, and/or the spatial activity SA of at least one area of an image or its temporal activity (defined by comparison with a previous image), and/or the average brightness of at least one area of an image.

The points may be generated from at least two characteristic numbers obtained from a single characteristic parameter.

Alternatively, the points may be generated from at least two characteristic numbers obtained from at least two characteristic audio and/or video parameters.

In the method, the data stream comprises video data and audio data and the method effects firstly video synchronization based on points D₁ and D₂ associated with at least one characteristic video parameter corresponding to said video stream and secondly audio synchronization based on points D″₁ and D″₂ associated with at least one characteristic audio parameter corresponding to said audio stream.

It may then include a step of determining the synchronization shift between the video stream and the audio stream as the difference between said shifts determined for the video stream and for the audio stream.

BRIEF DESCRIPTION OF THE DRAWINGS

Other features and advantages of the invention will become more apparent on reading the description with reference to the appended drawings, in which:

FIG. 1 shows the architecture of a prior art system for measuring the quality of an audio signal;

FIG. 2 depicts the audio signal synchronization problem;

FIG. 3 shows an increase in synchronization accuracy that may be achieved in the context of the present invention;

FIG. 4 depicts an example of two bidimensional trajectories of audio signals to be synchronized in a situation where r=Δ/2;

FIGS. 5 and 6 depict two variants of synchronization between two trajectories assigned to two data streams;

FIG. 7 is a flowchart of a trajectory-based synchronization method of the invention;

FIGS. 8 to 10 depict synchronization in accordance with the invention when the significant parameter is a perceived audio parameter, FIGS. 10a and 10b respectively depicting the situation before and after synchronization of two trajectories; and

FIG. 11 depicts a use of a method employing autoregressive modeling of the signal with linear prediction coefficients as the characteristic parameter.

MORE DETAILED DESCRIPTION

The first step of the method calculates at least two characteristic numbers from one or more characteristic parameters over all of the time windows of the signals to be synchronized and over the required synchronization period; each number is therefore calculated at intervals Δ (see FIGS. 2 and 3), which yields N=T/Δ parameters. If possible, the number(s) must be simple to calculate, so as not to demand excessive calculation power. Each characteristic parameter may be of any kind and may be represented by a single number, for example. One characteristic parameter of the content of an audio signal is the bandwidth, for example.

Providing the parameters only at intervals Δ greatly reduces the quantity of data necessary to obtain synchronization from the reference signal S_(R). However, the accuracy of the resulting synchronization is necessarily limited; the uncertainty with respect to an ideal synchronization, i.e. to the nearest signal sample, is ±Δ/2. If this uncertainty is too great, one alternative is to reduce the period Δ; however, this modification is rarely possible since it calls into question the calculation of the characteristic number(s) and increases the quantity of data necessary for synchronization.

In the particular embodiment in which the parameters are also used to evaluate quality by comparing the parameters P₁ and P′₁, any synchronization error exceeding the resolution r₀ of the parameter will prevent estimation of the deterioration introduced (this is Situation A in FIG. 3).

To obtain an arbitrary synchronization accuracy, with an uncertainty value r that may be less than Δ/2, for example, without increasing the quantity of data extracted from the reference signal, the characteristic numbers may be calculated with a higher temporal resolution. For this purpose, the parameters are calculated at intervals r<Δ from the second signal to be synchronized (the “degraded” signal), which corresponds to Δ/r parameters P₁ ^(i) for a parameter P₁. The calculation complexity increases from T/Δ to T/r calculation windows, but only for the received signal. The situation B of FIG. 3 illustrates the method used. For example, r is a sub-multiple of Δ.
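The window counts involved can be checked with a few lines of Python. The values Δ=1024 and r=32 are the example values given in the summary above; the search period T of 64 reference windows and the function name `window_counts` are assumptions chosen only for this illustration.

```python
def window_counts(T, delta, r):
    """Return (reference windows T/delta, received windows T/r,
    refinement factor delta/r), all durations being in samples."""
    if T % delta or delta % r:
        raise ValueError("T must be a multiple of delta, and delta of r")
    return T // delta, T // r, delta // r

# Hypothetical search period of 64 reference windows of 1024 samples each.
n_ref, n_rx, factor = window_counts(T=64 * 1024, delta=1024, r=32)
print(n_ref, n_rx, factor)  # 64 2048 32
```

Only the second count, which concerns the received signal, grows when r is reduced; the quantity of data extracted from the reference signal stays at T/Δ numbers per characteristic number.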

Notation

-   T: synchronization search period (T is a multiple of Δ);
-   r₀: maximum permitted synchronization error/uncertainty;
-   e: synchronization error;
-   Δ: period of calculating the parameters from the signal;
-   P_(k): parameter calculated from the first (“reference”) signal S_(R) (k is a temporal index indicating to which calculation period Δ P_(k) corresponds);
-   P′_(k): parameter calculated from the second (“degraded”) signal S_(D) (k is a temporal index indicating to which calculation period Δ P′_(k) corresponds);
-   P′_(k) ^(i): parameter calculated from the second (“degraded”) signal S_(D) (k is a temporal index indicating to which calculation period Δ P′_(k) ^(i) corresponds, and i is a temporal subindex indicating a number of periods r from 1 to Δ/r within the period Δ).

Note: All durations correspond to an integer number of samples of the audio or video signal.

The second step processes the parameters to define one or more coordinates. A set of β coordinates is calculated for each set of parameters P_(k) or P′_(k) ^(i) obtained over the window k of duration Δ corresponding to 1024 samples of the reference signal or the degraded signal, respectively, for example.

The prime aim of this step is to obtain pertinent coordinate values for carrying out synchronization, with given bounds and limits. Thus each coordinate is obtained from a combination of available characteristic numbers. Moreover, this step reduces the number of dimensions and therefore simplifies subsequent operations.

In one preferred embodiment, two coordinates must be obtained (β=2). For example, if two characteristic parameters are used, each of them may be used to determine a coordinate. Alternatively, more characteristic numbers may be used; processing may be carried out to provide fewer numbers, for example two coordinates, which are then interpreted as a projection from a space with as many dimensions as there are characteristic numbers to a space with two coordinates, for example.

The third step constructs the trajectory (see FIG. 4). The trajectory defines a signature of a segment of the audio signal over the duration T by means of a series of points in a space with as many dimensions as there are coordinates. The use of a space with two or more dimensions enables a particular trajectory to be constructed, achieving high reliability and high accuracy of synchronization.

After these three steps, synchronizing the signals amounts to synchronizing two trajectories (or curves parameterized by time) in a space of two or more dimensions:

-   The first trajectory is defined by points R_(k) obtained from significant numbers P_(k) calculated at intervals Δ over the time period T. There are N=T/Δ points R_(k).
-   The second trajectory is defined by points D_(k)=D_(k) ^(i) obtained from significant numbers P′_(k)=P′_(k) ^(i) calculated at intervals Δ over the range T. There are N′=N=T/Δ points D_(k).

If a period r<Δ is used to calculate the parameters P′_(k) ^(i), the trajectory is defined by the points D_(k) ^(i), of which there are N′=T/r.

To this end, a criterion of resemblance between two trajectories of N points (or of N and N′ points) is used. The following methods are described by way of example:

The first method proposed minimizes a distance between the two trajectories.

The basic idea is to calculate a distance over a portion of the trajectory. An appropriate portion of each trajectory is selected as a function of the maximum range of desynchronization of the curves corresponding to the audio or video signals.

Over these portions, a cumulative total Diff of the distances d between the peaks R_(k) and D_(k+delta) or D_(k+delta) ^(i) of the curves is calculated from equations (1) and (2) below, respectively, by applying successive shifts delta, in order to find the shift minimizing the distance Diff between trajectories.

FIG. 4 depicts the calculation for one example, with points defined by two coordinates in a space with β=2 dimensions. For the “degraded” signal, the parameters are calculated at intervals r=Δ/2, i.e. with twice the resolution of the first signal.

The distance Diff gives the distance between the two trajectories. The arithmetic mean of the peak to peak distances is preferred, but another distance calculation is equally applicable.

$$\mathrm{Diff}(\mathit{delta}) = \sqrt[\alpha D]{\frac{1}{N}\sum_{k=1}^{N}\left[ d\left( D_{k}, R_{k+\mathit{delta}} \right) \right]} \qquad (1)$$

where αD=1 . . . ∞, N=T/Δ and d(A,B) is the distance between two points or peaks. This distance d(A,B) may also be of any kind. In one particular embodiment, the Euclidean distance is used:

$$d(A,B) = \sqrt[\alpha d]{\sum_{j=1}^{\beta}\left( a_{j} - b_{j} \right)^{\alpha d}} \qquad (2)$$

where αd=1 . . . ∞, a_(j) and b_(j) are the coordinates of the points A and B and β designates the number of coordinates of each point.

The shift delta giving the minimum distance Diff corresponds to resynchronization of the curves and consequently of the original signal. In this example (FIG. 4) the shift is 2, which is twice the initial parameter calculation period Δ. The synchronization range will therefore be from:

$$t + 2\Delta - \frac{\Delta}{2} \quad \text{to} \quad t + 2\Delta + \frac{\Delta}{2} \qquad (3)$$
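The distance criterion can be sketched in Python as follows, for the case where both trajectories are sampled at the same interval Δ and the arithmetic mean of Euclidean distances is used. The function names, the convention that the reference trajectory is longer than the degraded one so that every candidate shift compares the same number N of points, and the synthetic test trajectory are assumptions made for this sketch.

```python
import math

def euclid(a, b):
    """Euclidean distance between two points with the same number of coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diff(ref, deg, delta):
    """Arithmetic mean of d(D_k, R_{k+delta}) over the N points of deg."""
    return sum(euclid(deg[k], ref[k + delta])
               for k in range(len(deg))) / len(deg)

def best_shift(ref, deg):
    """Try every shift for which deg fits inside ref and return the one
    that minimizes Diff (raises ValueError if deg is longer than ref)."""
    shifts = range(len(ref) - len(deg) + 1)
    return min(shifts, key=lambda s: diff(ref, deg, s))

# Hypothetical two-coordinate trajectory (beta = 2); the "degraded"
# trajectory is a 30-point excerpt starting 3 points into the reference.
ref = [(math.sin(0.7 * k), math.cos(1.3 * k)) for k in range(40)]
deg = ref[3:33]
print(best_shift(ref, deg))  # 3
```

The returned shift is expressed in calculation periods Δ, so the corresponding offset in samples is the shift multiplied by Δ, with the ±Δ/2 uncertainty discussed earlier.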

The second criterion proposed is maximization of a correlation between the two trajectories.

This criterion works in a similar way to the preceding one, except that it maximizes the value Correl. Equations (1) and (2) are replaced by the following two equations:

$$\mathrm{Correl}(\mathit{delta}) = \sum_{k=1}^{N} D_{k} * R_{k+\mathit{delta}} \qquad (4)$$

in which the operator * denotes the scalar product defined as follows:

$$A * B = \frac{\sum_{j=1}^{\beta} a_{j}\, b_{j}}{\sqrt{\sum_{j=1}^{\beta} a_{j}^{2}} \cdot \sqrt{\sum_{j=1}^{\beta} b_{j}^{2}}} \qquad (5)$$

where a_(j) and b_(j) are the coordinates of the points A and B, the sums running over the β coordinates of each point.
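A minimal Python sketch of the correlation criterion follows. As assumptions for the illustration, the reference trajectory is taken to be longer than the degraded one so that every candidate shift sums the same number of terms, a point at the origin contributes zero, and the function names and test trajectory are hypothetical.

```python
import math

def norm_dot(a, b):
    """Normalized scalar product of two points (equation (5))."""
    den = (math.sqrt(sum(x * x for x in a)) *
           math.sqrt(sum(y * y for y in b)))
    return sum(x * y for x, y in zip(a, b)) / den if den else 0.0

def correl(ref, deg, delta):
    """Sum of D_k * R_{k+delta} over the points of deg (equation (4))."""
    return sum(norm_dot(deg[k], ref[k + delta]) for k in range(len(deg)))

def best_shift(ref, deg):
    """Return the shift that maximizes Correl."""
    shifts = range(len(ref) - len(deg) + 1)
    return max(shifts, key=lambda s: correl(ref, deg, s))

# Hypothetical trajectory; the excerpt starts 3 points into the reference.
ref = [(math.sin(0.7 * k), math.cos(1.3 * k)) for k in range(40)]
deg = ref[3:33]
print(best_shift(ref, deg))  # 3
```

Because each term is normalized, it depends only on the direction of the two points as seen from the origin, not on their magnitudes.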

The following methods are particularly suitable for β=2 coordinates.

Other techniques make the method more robust in the presence of significant differences between the signals to be synchronized, for example caused by deterioration during broadcasting, namely:

-   distance between successive angles of the trajectories

This method consists in transforming the two-dimensional trajectory into a series of angles between successive segments defined by the points of the trajectory. FIG. 5 shows the definition of the angles Δφ.

The criterion used for synchronizing the two trajectories is minimization of the following equation:

$$\mathrm{Diff}(\mathit{delta}) = \sum_{k=1}^{N-1} \left| \varphi_{k} - \varphi_{k+\mathit{delta}} \right| \qquad (6)$$
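The angle criterion can be sketched in Python as follows. The exact definition of the angles is given by FIG. 5, which is not reproduced here, so the sketch assumes the signed turning angle between each pair of successive segments; it also assumes that angle differences are wrapped into the range 0 to π and that the reference trajectory is longer than the degraded one. The function names are hypothetical.

```python
import math

def turning_angles(points):
    """Signed angle between each pair of successive segments."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ux, uy = x1 - x0, y1 - y0
        vx, vy = x2 - x1, y2 - y1
        angles.append(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))
    return angles

def angle_gap(a, b):
    """Absolute difference between two angles, wrapped into [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def best_shift(ref, deg):
    """Return the shift minimizing the summed angle differences."""
    ra, da = turning_angles(ref), turning_angles(deg)
    shifts = range(len(ra) - len(da) + 1)
    return min(shifts, key=lambda s: sum(angle_gap(da[k], ra[k + s])
                                         for k in range(len(da))))

# Hypothetical trajectory; the degraded excerpt starts 3 points into the
# reference and is additionally scaled and translated.
ref = [(math.sin(0.7 * k), math.cos(1.3 * k)) for k in range(40)]
deg = [(2 * x + 1, 2 * y - 0.5) for x, y in ref[3:33]]
print(best_shift(ref, deg))  # 3
```

Turning angles are unchanged by a translation or a uniform scaling of the trajectory, which is why the example still recovers the shift although the coordinates of the degraded points differ from those of the reference.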

-   intercepted area between the two curves

This method consists in transforming the two-dimensional trajectory into a series of areas intercepted by successive segments defined by the points of the trajectory. FIG. 6 shows the definition of the intercepted areas S.

The criterion used for synchronizing the two trajectories is minimization of the following equation:

$$S_{\mathrm{Total}} = \mathrm{Diff}(\mathit{delta}) = \sum_{k=1}^{N-1} S_{k,\,k+\mathit{delta}} \qquad (7)$$

Finally, the simultaneous use of a plurality of criteria is possible. Once the value delta of the resynchronization between the two signals has been determined by one of the above methods, the two signals may be resynchronized by applying the shift delta to one of the signals. Synchronization is obtained to an accuracy determined by the rate at which the characteristic numbers are calculated.

FIG. 7 is a flowchart of a synchronization method.

If the required accuracy is not achieved, i.e. if the synchronization is too “coarse” for the target application, there may be a final step to refine the preceding result.

A prior art procedure may be applied to the synchronization uncertainty range Δ or r, which is now sufficiently small for the complexity to be acceptable. For example, an approach based on correlation in the time domain may be used, preferably an approach that uses marker signals.

However, this step should be used only in certain specific instances because, in the quality measurement type of target application, refining the synchronization is generally not necessary since sufficient accuracy is achieved. Moreover, as explained above, the prior art techniques necessitate the availability of data on the signals that is not readily transportable in a complex and distributed system.

One particular embodiment of the invention relates to an application for monitoring audio quality in a digital television broadcast network. In this context, a major benefit of the invention is that it achieves synchronization using data used for evaluating quality, as this avoids or minimizes the need to transmit data specific to synchronization.

Diverse characteristic numbers for estimating the magnitude of the deterioration introduced on broadcasting the signal are calculated from the reference signal at the input of the network (this refers to “reduced reference” methods). The reference numbers P_(R) are sent over a data channel to the quality measurement point, characteristic numbers P_(M) are calculated from the degraded signal at the measurement point, and quality is estimated by comparing the parameters P_(R) and P_(M). They must be synchronized for this, on the basis of the characteristic parameter(s) used for the reference.

Quality is therefore estimated by comparing the parameters P_(R) and P_(M), which must be synchronized for this to be possible.

The principle of objective perceived measurements is based on converting a physical representation (sound pressure level, level, time and frequency) into a psychoacoustic representation (sound force, masking level, critical times and bands or barks) of two signals (the reference signal and the signal to be evaluated), in order to compare them. This conversion is effected by modeling the human auditory apparatus (generally by spectral analysis in the Barks domain followed by spreading phenomena).

The following embodiment of the method of the invention uses a perceived characteristic parameter known as the “perceived count error”. The novelty of this parameter is that it establishes a measurement of the uniformity of a window in the audio signal. A sound signal whose frequency components are stable is considered to be uniform. Conversely, “perfect” noise corresponds to a signal that covers all the frequency bands uniformly (flat spectrum). This type of parameter may therefore be used to characterize the content of the signal. This capacity is reinforced by its perceived character, i.e. by taking account of characteristics of the human auditory apparatus known from psychoacoustics.

The steps applied to the reference signal and to the degraded signal to take account of psychoacoustics are as follows:

-   -   Windowing of the temporal signal in blocks and then, for each block, calculating the excitation induced by the signal using a hearing model. This representation of the signals takes account of psychoacoustic phenomena and supplies a histogram whose counts are basilar component values. Thus only the audible components of the signal need to be taken into account, i.e. only the useful information. Standard models may be used to obtain this excitation: attenuation of the external and middle ear, integration in critical bands and frequency masking. The time windows chosen are of approximately 42 ms duration (2048 points at 48 kHz), with a 50% overlap. This achieves a temporal resolution of the order of 21 ms.
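The windowing figures quoted above can be checked with a short sketch. The function names and the choice to drop a trailing partial block are assumptions made for illustration; only the window length, sampling rate and overlap come from the text.

```python
def frame_timing(window_len=2048, sample_rate_hz=48_000, overlap=0.5):
    """Return (window duration, hop duration) in milliseconds."""
    hop = int(window_len * (1.0 - overlap))  # samples between window starts
    window_ms = 1000.0 * window_len / sample_rate_hz
    hop_ms = 1000.0 * hop / sample_rate_hz
    return window_ms, hop_ms


def split_into_blocks(samples, window_len=2048, overlap=0.5):
    """Cut a sample sequence into overlapping blocks.

    Assumption: a trailing partial block is dropped rather than padded.
    """
    hop = int(window_len * (1.0 - overlap))
    last_start = len(samples) - window_len
    return [samples[i:i + window_len] for i in range(0, last_start + 1, hop)]
```

With the default values, `frame_timing()` gives a window of about 42.7 ms and a hop of about 21.3 ms, which matches the "approximately 42 ms" and "of the order of 21 ms" figures.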

Modeling entails a plurality of steps. In the first step, the attenuation filter of the external and middle ear is applied to the spectral power density obtained from the spectrum of the signal. This filter also takes account of an absolute hearing threshold. The concept of critical bands is modeled by conversion from a frequency scale to a basilar scale. The next step calculates individual excitations to take account of masking phenomena, using the spreading function in the basilar scale and non-linear addition. The final step uses a power function to obtain the compressed loudness for modeling the non-linear frequency sensitivity of the ear by a histogram comprising 109 basilar components.

The counts of the histogram obtained are then periodically vectored in three classes to obtain a representation along a trajectory that is used to visualize the evolution of the structure of the signals and for synchronization. This also yields a simple and concise characterization of the signal and thus provides a reference parameter (or characteristic parameter).

There are various strategies for fixing the limits of the three classes; the simplest divides the histogram into three areas of equal size. Thus the 109 basilar components, which represent 24 Barks, may be separated at the following indices: $\begin{matrix}{IS_{1} = 36,\quad \text{i.e.}\quad z = \frac{24}{109} \times 36 = 7.927\ \text{Barks}} & (8) \\ {IS_{2} = 73,\quad \text{i.e.}\quad z = \frac{24}{109} \times 73 = 16.073\ \text{Barks}} & (9)\end{matrix}$

The second strategy takes account of the BEERENDS scaling areas. This corresponds to compensation of the gain between the excitation of the reference signal and that of the signal under test by considering three areas in which the ear would perform this same operation. Thus the limits set are as follows: $\begin{matrix}{IS_{1} = 9,\quad \text{i.e.}\quad z = \frac{24}{109} \times 9 = 1.982\ \text{Barks}} & (10) \\ {IS_{2} = 100,\quad \text{i.e.}\quad z = \frac{24}{109} \times 100 = 22.018\ \text{Barks}} & (11)\end{matrix}$
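The index-to-Bark conversions in equations (8) to (11) can be reproduced with the following sketch. The function and constant names are hypothetical; the only values taken from the text are the 109 components, the 24 Barks and the four separation indices.

```python
N_COMPONENTS = 109  # basilar components in the histogram
N_BARKS = 24        # Bark range covered by those components


def index_to_barks(index):
    """Convert a basilar component index to a position z in Barks."""
    return N_BARKS / N_COMPONENTS * index


# Separation indices quoted in the text for the two strategies.
EQUAL_SIZE_LIMITS = (36, 73)
BEERENDS_LIMITS = (9, 100)
```

For example, `round(index_to_barks(36), 3)` gives 7.927 and `round(index_to_barks(100), 3)` gives 22.018, in agreement with equations (8) and (11).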

The trajectory is then represented in a triangle known as the frequency triangle. For each block three counts C₁, C₂ and C₃ are obtained, and thus two Cartesian coordinates, conforming to the following equations: $\begin{matrix}{X = {C_{1}/N} + \frac{C_{2}/N}{2}} & (12) \\ {Y = {C_{2}/N} \cdot \sin(\pi/3)} & (13)\end{matrix}$

-   -   where C₁ is the sum of the excitations for the high frequencies (components above IS₂),
    -   C₂ is the count associated with the medium frequencies (components from IS₁ to IS₂), and
    -   N=C₁+C₂+C₃ is the total sum of the values of the components.
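A minimal sketch of equations (12) and (13) follows. Two points are assumptions rather than statements from the text: C₃ is taken to be the sum of the low-frequency components (those below the first separation index), and the exact placement of the boundary components is an arbitrary choice. The function names are hypothetical.

```python
import math


def class_sums(components, is1, is2):
    """Sum basilar components into three classes split at indices is1 and is2.

    Assumption: components[:is1] are "low", components[is1:is2] "medium",
    components[is2:] "high"; the text does not fix the boundary convention.
    """
    c3 = sum(components[:is1])     # low frequencies (assumed meaning of C3)
    c2 = sum(components[is1:is2])  # medium frequencies
    c1 = sum(components[is2:])     # high frequencies
    return c1, c2, c3


def triangle_point(c1, c2, c3):
    """Map three class counts to (X, Y) in the frequency triangle."""
    n = c1 + c2 + c3
    if n == 0:
        return 0.0, 0.0  # silent block: arbitrary convention, not from the text
    x = c1 / n + (c2 / n) / 2
    y = (c2 / n) * math.sin(math.pi / 3)
    return x, y
```

The three vertices of the triangle correspond to a block whose energy lies entirely in one class: `(1, 0)` for high, `(0.5, sin(π/3))` for medium, and `(0, 0)` for low frequencies.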

A point (X, Y) is therefore obtained for each temporal window of the signal. Each of the coordinates X and Y constitutes a characteristic number. Alternatively, C₁, C₂ and C₃ may be taken as characteristic numbers.

For a complete sequence, the associated representation is therefore a trajectory parameterized by time, as shown in FIG. 8.

Of the various methods available for synchronizing the trajectories, the technique chosen by way of example is that based on minimizing the distance between points on the trajectories.
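The distance-based search can be sketched as follows, assuming the criterion is the arithmetic mean of the Euclidean distances between corresponding points and that only non-negative shifts (the degraded stream lagging the reference) need to be tested. Both function names and the search range parameter are hypothetical.

```python
import math


def mean_distance(traj_ref, traj_deg, shift, length):
    """Mean Euclidean distance between traj_ref[k] and traj_deg[k + shift]."""
    total = 0.0
    for k in range(length):
        x1, y1 = traj_ref[k]
        x2, y2 = traj_deg[k + shift]
        total += math.hypot(x1 - x2, y1 - y2)
    return total / length


def best_shift(traj_ref, traj_deg, max_shift):
    """Return the shift in 0..max_shift (in windows) minimizing the mean distance."""
    # Use the same number of points for every candidate shift so that
    # the criterion values are comparable.
    length = min(len(traj_ref), len(traj_deg) - max_shift)
    if length <= 0:
        raise ValueError("trajectories too short for the requested search range")
    return min(range(max_shift + 1),
               key=lambda s: mean_distance(traj_ref, traj_deg, s, length))
```

The returned value is expressed in windows; multiplying it by the spacing between characteristic numbers gives the shift in samples.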

It is important to note that the calculation of the parameter for the synchronization used in this case remains complex, but that this parameter may also be used to estimate the quality of the signal. It must therefore be calculated anyway, and this is therefore not an additional calculation load at the time of the comparison, especially as the calculation relating to this parameter is effected locally only for the received digital stream.

FIG. 9 summarizes the method used to synchronize the signals in the context of monitoring the quality of broadcast signals using the above characteristic parameter.

The following example illustrates the case of a reference file (R1) which is MPEG2 coded and decoded at 128 kbit/s, yielding a degraded file (R2). The desynchronization introduced is 6000 samples. The shift found is six windows, i.e. 6*1024=6144 samples. The error (144) is much less than the period (1024) of the characteristic parameter. FIGS. 10 a and 10 b show the trajectories before and after synchronization.
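The arithmetic of this example can be retraced in a few lines. The function names are hypothetical; the figures (6000 samples, six windows, a period of 1024 samples) are those quoted above.

```python
WINDOW_PERIOD = 1024  # samples between successive characteristic numbers


def shift_in_samples(shift_in_windows, period=WINDOW_PERIOD):
    """Convert a shift expressed in windows into samples."""
    return shift_in_windows * period


def residual_error(true_offset, shift_in_windows, period=WINDOW_PERIOD):
    """Absolute error, in samples, left after applying the window-level shift."""
    return abs(shift_in_samples(shift_in_windows, period) - true_offset)
```

Here `shift_in_samples(6)` is 6144 and `residual_error(6000, 6)` is 144. Six is also the whole number of windows closest to 6000/1024, so the shift found is the best available at this resolution.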

Before synchronization (FIG. 10 a), there is no point to point correspondence between the two trajectories. After synchronization (FIG. 10 b), the correspondence between the two trajectories is optimized in terms of the distance criterion (cf. equation (1)).

More refined synchronization is generally not needed, especially if the uncertainty resulting from the procedure explained here is less than the maximum synchronization error permitted by the quality measurement parameter. For more demanding quality parameters, the necessary resolution r₀ is of the order of 32 samples.

In FIG. 10 a, the original range is of the order of 120 ms, i.e. 5760 samples at 48 kHz. Using only the characteristic numbers available for the evaluation of quality (every 1024 samples, i.e. every Δ), a first synchronization is carried out with an uncertainty of 1024 samples, which is better by a factor of 5 compared to 5760, for very limited calculation power dedicated to synchronization.

However, in a second step, for example, more frequent calculation of the quality parameters for the second (degraded) signal (r<Δ) enables the synchronization error to be further reduced to r samples, if required.

Another characteristic parameter uses autoregressive modeling of the signal.

The general principle of linear prediction is to model a signal as a combination of its past values. The basic idea is to calculate the N coefficients of a prediction filter by autoregressive (all pole) modeling. It is possible to obtain a predicted signal from the real signal using this adaptive filter. The prediction or residual errors are calculated from the difference between these two signals. The presence and the quantity of noise in a signal may be determined by analyzing these residues.

The magnitude of the modifications and defects introduced may be estimated by comparing the residues obtained for the reference signal and those calculated from the degraded signal.

Because there is no benefit in transmitting all of the residues if the bit rate of the reference is to be reduced, the reference to be transmitted corresponds to the maximum of the residues over a time window of given size.

Two methods of adapting the coefficients of the prediction filter are described hereinafter by way of example:

-   -   The LEVINSON-DURBIN algorithm, which is described, for example, in “Traitement numerique du signal—Théorie et pratique” [“Digital signal processing—Theory and practice”] by M. BELLANGER, MASSON, 1987, pp. 393 to 395. To use this algorithm, an estimate is required of the autocorrelation of the signal over a set of N₀ samples. This autocorrelation is used to solve the Yule-Walker system of equations and thus to obtain the coefficients of the prediction filter. Only the first N values of the autocorrelation function may be used, where N designates the order of the algorithm, i.e. the number of coefficients of the filter. The maximum prediction error is retained over a window comprising 1024 samples.

    -   The gradient algorithm, which is also described in the above-mentioned book by M. BELLANGER, for example, starting at page 371. The main drawback of the preceding parameter is the necessity, in the case of a DSP implementation, to store the N₀ samples in order to estimate the autocorrelation, together with the coefficients of the filter, and then to calculate the residues. The second parameter avoids this by using another algorithm to calculate the coefficients of the filter, namely the gradient algorithm, which uses the error that has occurred to update the coefficients. The coefficients of the filter are modified in the direction of the gradient of the instantaneous quadratic error, with the opposite sign.
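The gradient update described in the second item can be illustrated by a least-mean-squares style predictor. This is a sketch under assumptions: the step size `mu`, the zero initialization and the function name are not taken from the text, and the factor 2 of the gradient is absorbed into `mu`.

```python
def gradient_prediction_residues(x, order, mu):
    """Adaptive linear prediction updated sample by sample.

    Returns the final coefficients and the list of prediction residues.
    No block of samples needs to be stored, only the last `order` values.
    """
    w = [0.0] * order  # prediction filter coefficients (assumed to start at zero)
    residues = []
    for n in range(order, len(x)):
        past = [x[n - 1 - j] for j in range(order)]
        predicted = sum(wj * pj for wj, pj in zip(w, past))
        e = x[n] - predicted
        residues.append(e)
        # The gradient of e**2 with respect to w is -2 * e * past, so moving
        # against the gradient adds mu * e * past to the coefficients.
        w = [wj + mu * e * pj for wj, pj in zip(w, past)]
    return w, residues
```

On a constant signal with `order=1`, the single coefficient converges towards 1 and the residues shrink towards zero, which is the expected behavior for a perfectly predictable input.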

When the residues have been obtained from the difference between the predicted signal and the real signal, only the maximum of their absolute values over a time window of given size T is retained. The reference vector to be transmitted can therefore be reduced to a single number.
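The reduction of the residues to one number per window can be sketched as below. The function name is hypothetical, and keeping a shorter final window is an arbitrary choice made for the illustration.

```python
def max_abs_per_window(residues, window_len=1024):
    """Keep only the largest absolute residue in each consecutive window.

    Assumption: a shorter final window is kept rather than discarded.
    """
    return [
        max(abs(r) for r in residues[i:i + window_len])
        for i in range(0, len(residues), window_len)
    ]
```

For instance, `max_abs_per_window([1, -3, 2, 0.5, -0.25], window_len=2)` returns `[3, 2, 0.25]`.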

After transmission followed by synchronization, comparison consists in simply calculating the distance between the maxima of the reference and the degraded signal, for example using a difference method.

FIG. 5 summarizes the parameter calculation principle:

The main advantage of the two parameters is the low bit rate necessary for transferring the reference, which is reduced to one real number for 1024 signal samples.

However, no account is taken of any psychoacoustic model.

Another characteristic parameter uses autoregressive modeling of the basilar excitation.

In contrast to the standard linear prediction method, this method takes account of psychoacoustic phenomena in order to obtain an evaluation of perceived quality. For this purpose, calculating the parameter entails modeling diverse hearing principles. Linear prediction models the signal as a combination of its past values. Analysis of the residues (or prediction errors) determines the presence of noise in a signal and estimates the noise. The major drawback of these techniques is that they take no account of psychoacoustic principles. Thus it is not possible to estimate the quantity of noise actually perceived.

The method uses the same general principle as standard linear prediction and additionally integrates psychoacoustic phenomena in order to adapt to the non-linear sensitivity of the human ear in terms of frequency (pitch) and intensity (loudness).

The spectrum of the signal is modified by means of a hearing model before calculating the linear prediction coefficients by autoregressive (all pole) modeling. The coefficients obtained in this way provide a simple way to model the signal taking account of psychoacoustics. It is these prediction coefficients that are sent and used as a reference for comparison with the degraded signal.

The first part of the calculation of this parameter models psychoacoustic principles using the standard hearing models. The second part calculates linear prediction coefficients. The final part compares the prediction coefficients calculated for the reference signal and those obtained from the degraded signal. The various steps of this method are therefore as follows:

-   -   Time windowing of the signal followed by calculation of an internal representation of the signal by modeling psychoacoustic phenomena. This step corresponds to the calculation of the compressed loudness, which is in fact the excitation in the inner ear induced by the signal. This representation of the signal takes account of psychoacoustic phenomena and is obtained from the spectrum of the signal, using the standard form of modeling: attenuation of the external and middle ear, integration in critical bands, and frequency masking; this step of the calculation is identical to the parameter described above;
    -   Autoregressive modeling of the compressed loudness in order to obtain the coefficients of an FIR prediction filter, exactly as in standard linear prediction; the method used is that of autocorrelation by solving the Yule-Walker equations; the first step for obtaining the prediction coefficients is therefore calculating the autocorrelation of the signal.

It is possible to calculate the perceived autocorrelation of the signal using an inverse Fourier transform by considering the compressed loudness as a filtered spectral power.
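The relation used here is the general one between a power spectrum and an autocorrelation: the inverse discrete Fourier transform of the former gives the (circular) autocorrelation. The sketch below shows only that relation with a direct inverse DFT. How the basilar components are laid out as a spectrum (for example, whether they are mirrored to form a symmetric spectrum) is not specified in the text and is left to the caller; the function name is hypothetical.

```python
import cmath


def autocorrelation_from_power(power):
    """Inverse DFT of a power spectrum, returning the real part for each lag.

    Assumption: `power` is a full, symmetric power spectrum of length n,
    so that the result is real up to rounding error.
    """
    n = len(power)
    return [
        sum(p * cmath.exp(2j * cmath.pi * k * lag / n)
            for k, p in enumerate(power)).real / n
        for lag in range(n)
    ]
```

A flat spectrum yields an autocorrelation concentrated at lag 0, which is the signature of white noise, whereas a spectrum with only a DC term yields a constant autocorrelation.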

One method of solving the Yule-Walker system of equations and thus of obtaining the coefficients of a prediction filter uses the Levinson-Durbin algorithm.
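As an illustration, a textbook form of the Levinson-Durbin recursion is sketched below. It is not taken from the cited reference; the function names, the biased autocorrelation estimate and the sign convention (prediction as a weighted sum of past values) are assumptions.

```python
def autocorrelation(x, order):
    """Biased autocorrelation estimate r[0..order] of the sequence x."""
    n = len(x)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag)) / n
        for lag in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for a predictor of the given order.

    Returns (a, err) where the prediction is
    x_hat[n] = a[0]*x[n-1] + ... + a[order-1]*x[n-order]
    and err is the final prediction error power.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # degenerate input (e.g. all-zero signal)
        # Reflection coefficient for stage i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

For the autocorrelation sequence `[1.0, 0.5, 0.25]` of a first-order process, `levinson_durbin(r, 2)` returns coefficients close to `[0.5, 0.0]` and an error power of 0.75: the second coefficient vanishes because one past value already explains the correlation.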

It is the prediction coefficients that constitute the reference vector to be sent to the comparison point. The transforms used for the final calculations on the degraded signal are the same as are used for the initial calculations applied to the reference signal.

-   -   Estimating the deterioration by calculating a distance between the vectors from the reference and from the degraded signal. This compares coefficient vectors obtained for the reference and for the transmitted audio signal, enabling the deterioration caused by transmission to be estimated, using an appropriate number of coefficients. The higher this number, the more accurate the calculations, but the greater the bit rate necessary for transmitting the reference. A plurality of distances may be used to compare the coefficient vectors. The relative size of the coefficients may be taken into account, for example.

The principle of the method may be as summarized in the FIG. 11 diagram.

Modeling psychoacoustic phenomena yields 24 basilar components. The order N of the prediction filter is 32. From these components, 32 autocorrelation coefficients are estimated, yielding 32 prediction coefficients, of which only 5 to 10 are retained as a quality indicator vector, for example the first 5 to 10 coefficients.

The main advantage of this parameter is that it takes account of psychoacoustic phenomena. To this end, it has been necessary to increase the bit rate needed to transfer the reference consisting of 5 or 10 values for 1024 signal samples (21 ms for an audio signal sampled at 48 kHz), that is to say a bit rate of 7.5 to 15 kbit/s.
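The quoted bit rates can be reproduced under one assumption that the text does not state explicitly: each transmitted value is coded on 32 bits. The function name is hypothetical.

```python
def reference_bit_rate(values_per_window, window_period=1024,
                       sample_rate_hz=48_000, bits_per_value=32):
    """Bit rate in bit/s needed to send the reference values.

    Assumption: 32 bits per value; this choice is what makes the result
    agree with the 7.5 to 15 kbit/s range quoted in the text.
    """
    windows_per_second = sample_rate_hz / window_period  # 46.875 at 48 kHz
    return values_per_window * bits_per_value * windows_per_second
```

With 5 values per window the function returns 7500 bit/s, and with 10 values it returns 15000 bit/s.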

The characteristic parameter P may generally be any magnitude obtained from the content of the digital signals, for example, in the case of video signals:

-   -   the brightness of the image or of an area thereof as given by the continuous coefficients F(0,0) of the discrete cosine transform of the image, or of any other linear and orthogonal transform, by blocks or global, and/or
    -   the contrast of the image or of an area thereof, obtained by applying a Sobel filter, for example, and/or
    -   the activity SA of the image as defined, for example, in the Applicant's application PCT WO 99/18736, and obtained by a linear and orthogonal transformation by blocks (discrete cosine transform, Fourier transform, Haar transform, Hadamard transform, slant transform, wavelet transform, etc.),
    -   the average of the image,

and in the case of audio signals:

-   -   the power, and/or
    -   the spectral power density as defined in French Patent Application FR 2 769 777 filed 13 Oct. 1997, and/or
    -   one of the parameters described above.

It will be noted that the parameter P may be degraded by transmission, but in practice it is found that synchronization may be obtained by the method of the invention at the levels of deterioration generally encountered in transmission networks.

As a general rule, once synchronization has been acquired, the method may be used to verify that it has been retained, in order to be able to remedy disturbances such as bit stream interruptions, changes of bit stream, changes of decoder, etc., as and when required, by resynchronizing the two digital signals E and S.

The method described is applicable whenever it is necessary to synchronize two digital streams. The method yields a first synchronization range that is sufficiently narrow to allow the use of standard real time fine synchronization methods.

The method advantageously exploits one or more parameters characteristic of the signals to be synchronized that are represented by at least two characteristic numbers, instead of all of the signals. In a preferred embodiment, the combined use of a plurality of parameters achieves more reliable synchronization than the prior art techniques. Moreover, the invention achieves synchronization at a chosen level of accuracy and with less complexity than existing methods. This form of synchronization delimits an error range with a duration allowing subsequent use of standard “fine” synchronization methods if higher accuracy is required.

One particular application of measuring equipment for implementing the method of the invention is monitoring the quality of signals delivered by audiovisual digital signal broadcasting networks.

The invention also provides sound and picture synchronization for a data stream incorporating audio and video data. To this end, video synchronization is effected by calculating a video synchronization shift and audio synchronization is effected by calculating an audio synchronization shift. Moreover, it is possible to determine if an offset between the sound and the picture has occurred during transmission by comparing the values of the two shifts, for example.
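Comparing the two shifts requires a common time unit. The sketch below assumes that the video shift is counted in frames and the audio shift in samples, and converts both to milliseconds; these units, the function name and the sign convention are assumptions, since the text only states that the two values are compared.

```python
def sound_picture_offset_ms(video_shift_frames, video_frame_rate_hz,
                            audio_shift_samples, audio_sample_rate_hz):
    """Difference between the video and audio synchronization shifts, in ms.

    A result of zero means that sound and picture were delayed equally;
    a positive result means the picture was delayed more than the sound.
    """
    video_ms = 1000.0 * video_shift_frames / video_frame_rate_hz
    audio_ms = 1000.0 * audio_shift_samples / audio_sample_rate_hz
    return video_ms - audio_ms
```

For example, a video shift of 5 frames at 25 frames per second (200 ms) and an audio shift of 6144 samples at 48 kHz (128 ms) give an offset of 72 ms between picture and sound.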

1. A method of synchronizing two digital data streams with the same content, the method comprising the steps of: a) generating at given intervals for each of the two digital data streams S₁ and S₂ at least two characteristic numbers expressing at least one parameter characteristic of their content; b) generating from said numbers points D₁ and D₂ for each of the two streams S₁ and S₂ representing at least one of said characteristic parameters in a space of at least two dimensions, the points D₁ corresponding to the stream S₁ and the points D₂ corresponding to the stream S₂ being situated in a time period T and defining trajectories representative of the data streams S₁ and S₂ to be synchronized; c) shifting the time periods of duration T assigned to the digital data streams S₁ and S₂ relative to each other by calculating a criterion of superposition of said trajectories having an optimum value representing the required synchronization; d) choosing the shift between the time periods corresponding to said optimum value as a value representative of the synchronization.
2. A method according to claim 1, wherein one of the digital data streams is a reference stream S₁, the other data stream is a stream S₂ received via a transmission system, the numbers characteristic of the reference stream S₁ are transmitted therewith, and the numbers characteristic of the received stream S₂ are calculated in the receiver.
3. A method according to claim 1, wherein the step c) entails: c1) calculating a distance D between a first trajectory represented by the points D₁ belonging to a first time period of duration T and a second trajectory represented by the points D₂ belonging to a second time period of duration T, said distance D constituting said superposition criterion; and c2) shifting said first and second time periods of duration T relative to each other until a minimum value is obtained for the distance D that constitutes said optimum value.
4. A method according to claim 3, wherein the distance D is an arithmetic mean of the distances d between corresponding points D₁, D₂ of the two trajectories.
5. A method according to claim 4, wherein said distance d between the points is a Euclidean distance.
6. A method according to claim 1, wherein the step c) entails: c1) calculating a correlation function between corresponding points D₁, D₂ of the two trajectories, said correlation function constituting said superposition criterion; and c2) shifting said first and second time periods of duration T relative to each other until a minimum value of the correlation function is obtained that constitutes said optimum value.
7. A method according to claim 1, wherein the step c) entails: c1) converting each trajectory into a series of angles between successive segments defined by points on the trajectory; and c2) shifting said first and second time periods of duration T relative to each other until a minimum value is obtained for the differences between the values of angles obtained for homologous segments of the two trajectories, said minimum value constituting said optimum value.
8. A method according to claim 1, wherein the step c) entails: c1) converting the two trajectories into a series of areas intercepted by successive segments defined by points on said trajectories, the total intercepted area constituting said superposition criterion; and c2) shifting the time periods of duration T relative to each other until a minimum value is obtained of said total intercepted area, which minimum value constitutes said optimum value.
9. A method according to claim 1, wherein one of said given intervals is equal to Δ for one of the data streams S₁ and equal to r<Δ for the other data stream S₂.
10. A method according to claim 1, effecting a first synchronization by choosing the same given interval Δ for the two data streams and then a second synchronization for which the given interval is equal to Δ for one of the data streams S₁ and equal to r<Δ for the other data stream S₂.
11. A method according to claim 1, wherein the generation of said characteristic numbers for a reference audio data stream and for a transmitted audio data stream comprises the following steps: a) calculating for each time window the spectral power density of the audio stream and applying to it a filter representative of the attenuation of the inner and middle ear to obtain a filtered spectral density; b) calculating individual excitations from the filtered spectral density using the frequency spreading function in the basilar scale; c) determining the compressed loudness from said individual excitations using a function modeling the non-linear frequency sensitivity of the ear, to obtain basilar components; and d) separating the basilar components into n classes, where n is much lower than the number of audio samples in a time window, and preferably into three classes, and calculating for each class a number C representing the sum of the frequencies of that class, there being either n characteristic numbers consisting of a number C or n′<n characteristic numbers generated from said numbers C.
12. A method according to claim 1, wherein the generation of a characteristic number for a reference audio data stream and for a transmitted audio data stream comprises the following steps: a) calculating N coefficients of a prediction filter by autoregressive modeling; and b) determining in each time window the maximum value of the residue as the difference between the signal predicted by means of the prediction filter and the audio signal, said maximum prediction residue value constituting one of said characteristic numbers.
13. A method according to claim 1, wherein the generation of said characteristic numbers for a reference audio data stream and for a transmitted audio data stream comprises the following steps: a) calculating for each time window the spectral power density of the audio stream and applying to it a filter representative of the attenuation of the inner and middle ear to obtain a frequency spreading function in the basilar scale; b) calculating individual excitations from the frequency spreading function in the basilar scale; c) obtaining the compressed loudness from said individual excitations using a function modeling the non-linear frequency sensitivity of the ear, to obtain basilar components; d) calculating from said basilar components N′ prediction coefficients of a prediction filter by autoregressive modeling; and e) generating at least one characteristic number for each time window from at least one of the N′ prediction coefficients.
14. A method according to claim 13, wherein the characteristic numbers consist of 1 to 10 of said prediction coefficients and preferably 2 to 5 of said coefficients.
15. A method according to claim 1, wherein one characteristic number is the instantaneous power of an audio signal.
16. A method according to claim 1, wherein one characteristic number is the spectral power density of an audio signal.
17. A method according to claim 1, wherein one characteristic number is the bandwidth of an audio signal.
18. A method according to claim 1, wherein one characteristic number is the continuous coefficient of the transformation by a linear and orthogonal transform of at least one portion of an image belonging to the data stream, said transformation being effected by blocks or globally.
19. A method according to claim 1, wherein one characteristic number is the contrast of at least one area of an image belonging to the data stream.
20. A method according to claim 1, wherein one characteristic number is the spatial or temporal activity of at least one area of an image.
21. A method according to claim 1, wherein one characteristic number is the average brightness of at least one area of an image.
22. A method according to claim 1, wherein said points D₁ and D₂ are generated from at least two characteristic parameters.
23. A method according to claim 22, wherein said characteristic parameters are audio parameters.
24. A method according to claim 22, wherein said characteristic parameters are video parameters.
25. A method according to claim 1, wherein the data stream comprises video data and audio data and the method effects firstly video synchronization based on points D₁ and D₂ associated with at least one characteristic video parameter corresponding to said video stream and secondly audio synchronization based on points D″₁ and D″₂ associated with at least one characteristic audio parameter corresponding to said audio stream.
26. A method according to claim 25, including a step of determining the synchronization shift between the video stream and the audio stream as the difference between said shifts determined for the video stream and for the audio stream.